@lalitb lalitb commented Oct 8, 2025

Changes

Replaces Vec<u8> with bytes::Bytes for compressed telemetry data to eliminate expensive memory copies in the upload path. This matters most in retry scenarios, where the same payload is sent more than once.

Problem

The upload_batch() method was cloning the entire compressed payload (potentially MBs) on every upload attempt:

// Before: client.rs:141
self.uploader.upload(batch.data.clone(), ...) // Vec::clone() copies the entire 1+ MB buffer

Impact for a 1 MB batch with 3 retry attempts:

  • 3 MB of unnecessary heap allocation (one 1 MB copy per attempt)
  • ~300 µs wasted on memcpy operations
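The copy cost described above can be reproduced with the standard library alone. This is a minimal sketch, not the code from client.rs: `upload` and `bytes_copied_by_retries` are hypothetical stand-ins, and the only assumption carried over from the PR is that the uploader takes the payload by value, so the caller must clone before each attempt.

```rust
/// Hypothetical stand-in for the uploader: it takes ownership of the payload,
/// which is why the caller has to clone before every attempt.
fn upload(data: Vec<u8>) -> usize {
    data.len()
}

/// Simulates `attempts` upload attempts and returns the total number of
/// bytes duplicated by `Vec::clone()`.
fn bytes_copied_by_retries(payload: &Vec<u8>, attempts: usize) -> usize {
    let mut copied = 0;
    for _ in 0..attempts {
        // Allocates a fresh buffer and copies the whole payload into it.
        let copy = payload.clone();
        // The clone never shares the original's heap buffer.
        assert_ne!(copy.as_ptr(), payload.as_ptr());
        copied += upload(copy);
    }
    copied
}

fn main() {
    let payload = vec![0u8; 1024 * 1024]; // 1 MiB batch
    let copied = bytes_copied_by_retries(&payload, 3);
    println!("copied {} bytes across 3 attempts", copied);
}
```

The sketch only counts duplicated bytes. It does not measure time, so it does not by itself confirm the ~300 µs figure.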

Solution

Changed EncodedBatch.data from Vec<u8> to bytes::Bytes:

  // After: client.rs:141
  self.uploader.upload(batch.data.clone(), ...) // Bytes::clone() is O(1) refcount increment

How it works:

  1. Bytes::from(Vec<u8>) does NOT copy - it takes ownership of the Vec's allocation and wraps it in a reference-counted container (zero-copy conversion)
  2. Bytes::clone() just increments an atomic refcount - no data is copied
  3. Both original and cloned Bytes point to the same underlying buffer

Why Bytes instead of Arc<Vec<u8>>?

While Arc<Vec<u8>> would also provide cheap cloning, Bytes is the better choice for our HTTP upload use case:

  1. Native reqwest integration: reqwest::Body has impl From<Bytes> that wraps the buffer directly without copying (used at uploader.rs:242). With Arc<Vec<u8>>, we'd need to pass .as_ref() (giving &[u8]), which causes reqwest to allocate internally when converting to Body.
  2. Industry standard for async I/O: Bytes is the standard type for byte buffers in Rust's async ecosystem (tokio, hyper, reqwest), ensuring optimal integration.
  3. Future optimizations: Bytes::slice() provides zero-copy views into the buffer, which could be useful for optimizing the Bond encoding pipeline (bond_encoder.rs, central_blob.rs) if we identify unnecessary intermediate copies during schema/event serialization.
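To illustrate the first point, here is a std-only sketch of where Arc<Vec<u8>> falls short. `send_body` is a hypothetical stand-in for any API that needs an owned body, and `upload_from_arc` is likewise invented for illustration. Neither reflects reqwest's actual signatures.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for an HTTP client call that needs an owned body.
fn send_body(body: Vec<u8>) -> usize {
    body.len()
}

/// Sends a shared payload and reports whether the body handed to
/// `send_body` lived in a different buffer than the shared one.
fn upload_from_arc(shared: &Arc<Vec<u8>>) -> bool {
    // Borrowed view of the shared buffer (what `.as_ref()` would provide).
    let view: &[u8] = shared.as_slice();
    // Turning the borrowed view into an owned body copies every byte.
    let owned = view.to_vec();
    let copied = owned.as_ptr() != view.as_ptr();
    send_body(owned);
    copied
}

fn main() {
    let shared = Arc::new(vec![1u8; 4096]);

    // Cloning the Arc is cheap: both handles see the same buffer.
    let retry_handle = Arc::clone(&shared);
    assert_eq!(
        shared.as_slice().as_ptr(),
        retry_handle.as_slice().as_ptr()
    );

    // Producing an owned body from it is not: the payload is duplicated.
    assert!(upload_from_arc(&retry_handle));
    println!("Arc clone shared the buffer; the owned body was a copy");
}
```

In this sketch the cheap clone only defers the copy to the point where an owned body is required, which is the step the From<Bytes> conversion is described as avoiding.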

Merge requirement checklist

  • CONTRIBUTING guidelines followed
  • Unit tests added/updated (if applicable)
  • Appropriate CHANGELOG.md files updated for non-trivial, user-facing changes
  • Changes in public API reviewed (if applicable)

@lalitb lalitb requested a review from a team as a code owner October 8, 2025 06:07

codecov bot commented Oct 8, 2025

Codecov Report

❌ Patch coverage is 75.00000% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 53.9%. Comparing base (1ee6a9c) to head (7c510f9).

Files with missing lines | Patch % | Lines
...eneva/geneva-uploader/src/ingestion_service/mod.rs | 0.0% | 1 Missing ⚠️
Additional details and impacted files
@@          Coverage Diff          @@
##            main    #470   +/-   ##
=====================================
  Coverage   53.9%   53.9%           
=====================================
  Files         71      71           
  Lines      11220   11220           
=====================================
  Hits        6057    6057           
  Misses      5163    5163           

